Teraspeech'2000 : a 10,000 speakers database

Authors

  • Marie-José Caraty
  • Claude Montacié
Abstract

TeraSpeech is a bilingual (English and French) database developed in partnership with a French museum, le Musée des Sciences et de l’Industrie in Paris. The data are collected through a vocal signature demonstration. To validate a quality plan, a scenario for the demonstration was designed and various protocols were developed. The quality plan is presented along with the solutions found for its validation (i.e., the scenario and protocols). Statistics on TeraSpeech are given. Three perspectives are examined: validation, exploitation, and research. After a single year of the vocal signature exhibition, TeraSpeech’2000 comprises more than 30,000 sentences recorded from more than 10,000 visitors. The museum’s acoustics exhibition is planned to run for ten years, and TeraSpeech is expected to grow into a collection of more than 100,000 speakers, all recorded over the same sound acquisition channel.

Similar resources

Development of the estonian speechdat-like database

A new database project was launched in Estonia last year. It aims to collect telephone speech from a large number of speakers for speech and speaker recognition purposes. Up to 2000 speakers are expected to participate in the recordings. SpeechDat databases, especially the Finnish SpeechDat, have been chosen as a prototype for the Estonian database. It means that principles of corpus design...

Issues in Design and Collection of Large Telephone Speech Corpus for Slovenian Language

In this paper, different issues in the design, collection, and evaluation of a large-vocabulary telephone speech corpus for the Slovenian language are discussed. The database is composed of three text corpora containing 1530 different sentences. It contains read speech from 82 speakers, each of whom read on average more than 200 sentences; 21 speakers also read a text passage of 90 sentences. T...

An automatic interpretation system for travel conversation

We have developed an automatic interpretation system running on a mobile PC that helps oral communication between Japanese and English speakers in various situations during travel abroad. To allow a wide range of expressions and topics in the applied domain, we adopted an approach that uses general linguistic knowledge as well as domain-specific linguistic knowledge. ...

POLYCOST: A telephone-speech database for speaker recognition

This article presents an overview of the POLYCOST database dedicated to speaker recognition applications over the telephone network. The main characteristics of this database are: large mixed speech corpus size (> 100 speakers), English spoken by foreigners, mainly digits with some free speech, collected through international telephone lines, and more than eight sessions per speaker.

Publication date: 2000